
A general approximation lower bound in L^p norm, with applications to feed-forward neural networks

Neural Information Processing Systems

We study the fundamental limits to the expressive power of neural networks. Given two sets $F$, $G$ of real-valued functions, we first prove a general lower bound on how well functions in $F$ can be approximated in $L^p(\mu)$ norm by functions in $G$, for any $p \geq 1$ and any probability measure $\mu$. The lower bound depends on the packing number of $F$, the range of $F$, and the fat-shattering dimension of $G$. We then instantiate this bound to the case where $G$ corresponds to a piecewise-polynomial feed-forward neural network, and describe in detail the application to two choices of $F$: Hölder balls and multivariate monotonic functions. Besides matching (known or new) upper bounds up to log factors, our lower bounds shed some light on the similarities and differences between approximation in $L^p$ norm and in sup norm, solving an open question raised by DeVore et al. (2021). Our proof strategy differs from the sup norm case and uses a key probability result of Mendelson (2002).
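For intuition, here is a rough schematic of how a bound of this type is typically organized. This is an illustration under simplifying assumptions (in particular it omits the dependence on the range of $F$, and the constants $C$, $c$, $c'$ are placeholders), not the paper's actual statement.

% Schematic only: the constants and the exact dependence are illustrative assumptions, not the theorem proved in the paper.
\[
  \text{if}\quad \log \mathcal{M}\bigl(F,\varepsilon,\|\cdot\|_{L^p(\mu)}\bigr) \;>\; C\,\mathrm{fat}_{c\varepsilon}(G)
  \quad\text{then}\quad
  \sup_{f\in F}\ \inf_{g\in G}\ \|f-g\|_{L^p(\mu)} \;\geq\; c'\,\varepsilon ,
\]
where $\mathcal{M}(F,\varepsilon,\|\cdot\|)$ denotes an $\varepsilon$-packing number of $F$ and $\mathrm{fat}_{\delta}(G)$ the fat-shattering dimension of $G$ at scale $\delta$. In words: if $F$ contains many $\varepsilon$-separated functions while $G$ can only shatter few points at the corresponding scale, then some function of $F$ must be poorly approximated by every function of $G$.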


A general approximation lower bound in L^p norm, with applications to feed-forward neural networks: Supplementary Material

Neural Information Processing Systems

This is the appendix for "A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks". It provides the technical details needed to establish Proposition 1, Theorem 1 and Corollary 1: the graph-theoretic conventions for feed-forward architectures (a node $u \in V$ is a predecessor of another node $v \in V$ if there is a directed edge from $u$ to $v$), a check that every $g \in G$ is indeed $[0,1]$-valued, the proofs of the two properties on which Proposition 1 rests, and two lemmas (Section B.2) showing that clipping (truncation) to a known range can only help. The resulting bound is further refined for piecewise-affine activation functions.
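For intuition on the "clipping can only help" lemmas, the standard pointwise fact behind them can be sketched as follows. This is a minimal derivation assuming the target $f$ takes values in the known range $[a,b]$; the clipping notation below is introduced here for the sketch and is not taken from the appendix.

% Illustration only: clip_{[a,b]} is our notation for truncation to the known range.
\[
  \mathrm{clip}_{[a,b]}(y) \;=\; \min\{\max\{y,a\},\,b\}, \qquad
  f(x)\in[a,b] \;\Longrightarrow\; \bigl|\mathrm{clip}_{[a,b]}(g(x)) - f(x)\bigr| \;\le\; |g(x)-f(x)| \quad \text{for all } x,
\]
\[
  \text{and therefore}\quad \bigl\|\mathrm{clip}_{[a,b]}\circ g - f\bigr\|_{L^p(\mu)} \;\le\; \|g-f\|_{L^p(\mu)} \quad\text{for every } p\ge 1 \text{ and probability measure } \mu .
\]
The pointwise inequality holds because clipping is the projection onto $[a,b]$ and $f(x)$ already lies in that interval, so replacing a candidate network's output by its clipped version never increases the approximation error.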


